## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.636
## 3rd Qu.:6.000
## Max. :8.000
##
## 3 4 5 6 7 8
## 10 53 681 638 199 18
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.600 Min. :0.1200 Min. :0.0000 Min. :0.900
## 1st Qu.: 7.100 1st Qu.:0.3900 1st Qu.:0.0900 1st Qu.:1.900
## Median : 7.900 Median :0.5200 Median :0.2500 Median :2.200
## Mean : 8.258 Mean :0.5287 Mean :0.2661 Mean :2.409
## 3rd Qu.: 9.100 3rd Qu.:0.6400 3rd Qu.:0.4200 3rd Qu.:2.600
## Max. :13.200 Max. :1.5800 Max. :1.0000 Max. :8.300
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 21.00
## Median :0.07900 Median :13.00 Median : 37.00
## Mean :0.08701 Mean :15.18 Mean : 44.38
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 60.00
## Max. :0.61100 Max. :47.00 Max. :143.00
## density pH sulphates alcohol quality
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40 3: 10
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50 4: 52
## Median :0.9967 Median :3.310 Median :0.6200 Median :10.20 5:648
## Mean :0.9967 Mean :3.316 Mean :0.6573 Mean :10.43 6:614
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10 7:190
## Max. :1.0029 Max. :4.010 Max. :2.0000 Max. :14.00 8: 18
## 'data.frame': 1532 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : Ord.factor w/ 6 levels "3"<"4"<"5"<"6"<..: 3 3 3 4 3 3 3 5 5 3 ...
To reduce the impact of outliers, the top 1% of values were removed from fixed acidity, residual sugar, free sulfur dioxide, and total sulfur dioxide.
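The trimming step can be sketched in base R as below. This is a minimal illustration on synthetic data, not the code used for the report: the data frame `wine`, its simulated columns, and the helper `trim_top_1pct` are all assumptions made for the example.

```r
# Toy stand-in for the wine data (assumption); the real data has 1599 rows.
set.seed(1)
wine <- data.frame(
  fixed.acidity  = rnorm(500, mean = 8.3, sd = 1.7),
  residual.sugar = rlnorm(500, meanlog = 0.8, sdlog = 0.4)
)

# Hypothetical helper: keep rows at or below the 99th percentile
# of every column named in `cols`. Cutoffs are computed on the
# untrimmed data, so the order of the columns does not matter.
trim_top_1pct <- function(df, cols) {
  keep <- rep(TRUE, nrow(df))
  for (col in cols) {
    cutoff <- quantile(df[[col]], probs = 0.99, na.rm = TRUE)
    keep <- keep & df[[col]] <= cutoff
  }
  df[keep, , drop = FALSE]
}

trimmed <- trim_top_1pct(wine, c("fixed.acidity", "residual.sugar"))
nrow(wine) - nrow(trimmed)  # number of rows dropped
```

Because a row is dropped if it is extreme in any one of the listed columns, trimming several variables removes somewhat more than 1% of the rows in total.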
For this univariate section, we will create histograms to observe the distribution of each variable.
The Total Sulfur Dioxide, Free Sulfur Dioxide, and Sulphates variables show a long right tail, so we apply a log10 transform to bring them closer to a normal distribution. The summaries below compare total sulfur dioxide and sulphates before and after the transform. The variance decreases in both cases, especially for total sulfur dioxide (878.5 to 0.089).
## nbr.val nbr.null nbr.na min max
## 1.532000e+03 0.000000e+00 0.000000e+00 6.000000e+00 1.430000e+02
## range sum median mean SE.mean
## 1.370000e+02 6.799100e+04 3.700000e+01 4.438055e+01 7.572701e-01
## CI.mean.0.95 var std.dev coef.var
## 1.485396e+00 8.785376e+02 2.964014e+01 6.678632e-01
## nbr.val nbr.null nbr.na min max
## 1.532000e+03 0.000000e+00 0.000000e+00 7.781513e-01 2.155336e+00
## range sum median mean SE.mean
## 1.377185e+00 2.374926e+03 1.568202e+00 1.550212e+00 7.621960e-03
## CI.mean.0.95 var std.dev coef.var
## 1.495059e-02 8.900043e-02 2.983294e-01 1.924442e-01
## nbr.val nbr.null nbr.na min max
## 1.532000e+03 0.000000e+00 0.000000e+00 3.300000e-01 2.000000e+00
## range sum median mean SE.mean
## 1.670000e+00 1.006990e+03 6.200000e-01 6.573042e-01 4.349794e-03
## CI.mean.0.95 var std.dev coef.var
## 8.532185e-03 2.898652e-02 1.702543e-01 2.590190e-01
## nbr.val nbr.null nbr.na min max
## 1.532000e+03 1.000000e+00 0.000000e+00 -4.814861e-01 3.010300e-01
## range sum median mean SE.mean
## 7.825161e-01 -2.971731e+02 -2.076083e-01 -1.939772e-01 2.481604e-03
## CI.mean.0.95 var std.dev coef.var
## 4.867703e-03 9.434607e-03 9.713191e-02 -5.007388e-01
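The effect of a log10 transform on a long-tailed variable can be sketched as follows. The data here are simulated (an assumption, not the wine measurements), and `skewness` is a small helper written for this example. The summaries above appear to come from a descriptive-statistics helper such as `pastecs::stat.desc`; the sketch sticks to base R.

```r
set.seed(42)
# Synthetic right-skewed stand-in for total.sulfur.dioxide (assumption).
x <- rlnorm(1000, meanlog = 3.6, sdlog = 0.6)

# Sample skewness: third central moment divided by the
# second central moment raised to the power 1.5.
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / (mean((v - m)^2))^1.5
}

log_x <- log10(x)

c(raw = skewness(x), log10 = skewness(log_x))  # tail shrinks after the transform
c(raw = var(x), log10 = var(log_x))            # variance also falls
```

Much of the drop in variance comes from the change of scale itself, so skewness is usually the more telling before-and-after comparison. Variables that contain zeros need separate handling, since `log10(0)` is `-Inf`.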
Fixed and volatile acidity also show a long right tail, so we apply a log10 transform to bring them closer to a normal distribution. The variance in fixed acidity decreased sharply (2.667 to 0.0069), while the variance in volatile acidity decreased only slightly (0.0316 to 0.0232).
## nbr.val nbr.null nbr.na min max
## 1.532000e+03 0.000000e+00 0.000000e+00 4.600000e+00 1.320000e+01
## range sum median mean SE.mean
## 8.600000e+00 1.265170e+04 7.900000e+00 8.258290e+00 4.172367e-02
## CI.mean.0.95 var std.dev coef.var
## 8.184160e-02 2.667005e+00 1.633097e+00 1.977524e-01
## nbr.val nbr.null nbr.na min max
## 1.532000e+03 0.000000e+00 0.000000e+00 6.627578e-01 1.120574e+00
## range sum median mean SE.mean
## 4.578161e-01 1.392269e+03 8.976271e-01 9.087915e-01 2.126268e-03
## CI.mean.0.95 var std.dev coef.var
## 4.170705e-03 6.926194e-03 8.322376e-02 9.157629e-02
## nbr.val nbr.null nbr.na min max
## 1.532000e+03 0.000000e+00 0.000000e+00 1.200000e-01 1.580000e+00
## range sum median mean SE.mean
## 1.460000e+00 8.099800e+02 5.200000e-01 5.287076e-01 4.540049e-03
## CI.mean.0.95 var std.dev coef.var
## 8.905373e-03 3.157766e-02 1.777010e-01 3.361046e-01
## nbr.val nbr.null nbr.na min max
## 1.532000e+03 3.000000e+00 0.000000e+00 -9.208188e-01 1.986571e-01
## range sum median mean SE.mean
## 1.119476e+00 -4.629048e+02 -2.839967e-01 -3.021572e-01 3.887968e-03
## CI.mean.0.95 var std.dev coef.var
## 7.626306e-03 2.315816e-02 1.521781e-01 -5.036387e-01
Wine quality is separated into 3 bins: bad, average, and excellent. This helps classify wine quality. The histogram below shows that most wines fall into the average category.
## bad average excellent
## 62 1262 208
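The counts above are consistent with binning quality scores 3-4 as bad, 5-6 as average, and 7-8 as excellent. A minimal sketch with base R's `cut()`, rebuilding the quality vector from the per-score counts in the trimmed summary (the variable names are chosen for this example):

```r
# Quality scores after trimming, rebuilt from the per-score counts
# reported in the summary (3: 10, 4: 52, 5: 648, 6: 614, 7: 190, 8: 18).
quality <- rep(3:8, times = c(10, 52, 648, 614, 190, 18))

# Right-closed intervals: (2,4] -> bad, (4,6] -> average, (6,8] -> excellent
rating <- cut(quality,
              breaks = c(2, 4, 6, 8),
              labels = c("bad", "average", "excellent"),
              ordered_result = TRUE)

table(rating)
```

`ordered_result = TRUE` keeps the bad < average < excellent ordering, which is convenient for boxplots and facets.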
After removing the top 1% outliers from the sulfur dioxide, sugar, and acidity variables, 1532 of the original 1599 observations were left.
The main feature for me is learning what makes wine excellent or bad. I would like to eventually use machine learning to predict what wines people will like given past preferences.
I think that people have subjective opinions about the quality of wine due to their individual palates and preferences. I would think that sugar, acidity, and alcohol could be clustered into wine quality. So people who prefer sweet, sour, or bitter tastes could possibly be grouped together and might grade wine quality similarly.
I created a categorical variable from the wine quality column where 4 or lower was assigned bad, 5-6 was assigned average, and 7 or higher was assigned excellent.
I cleaned up the data by removing the outliers (the top 1%) from the sulfur dioxide variables, sugar, and acidity. Then I used a log10 transform on the sulphate, acidity, and sulfur dioxide data to bring them closer to normal distributions.
To get a bird’s eye view of the dataset, we will do a correlation plot between all the wine factors. This visual shows us how our variables interact from a bivariate standpoint.
The correlation coefficient for pH and fixed acidity is -0.68, meaning that pH tends to drop as fixed acidity increases. This makes sense, because a lower pH means that the substance is more acidic on the pH scale.
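The numbers behind a correlation plot come from a pairwise correlation matrix. The sketch below builds one with base R's `cor()` on simulated data; the data frame `wine` and its three columns are assumptions for illustration only.

```r
set.seed(2)
# Toy numeric data frame (assumption). cor() needs numeric columns,
# so a factor such as the binned rating would have to be left out.
n <- 150
fixed.acidity <- rnorm(n, mean = 8.3, sd = 1.6)
wine <- data.frame(
  fixed.acidity = fixed.acidity,
  pH      = 3.9 - 0.07 * fixed.acidity + rnorm(n, sd = 0.1),
  alcohol = rnorm(n, mean = 10.4, sd = 1.0)
)

cor_matrix <- cor(wine)   # pairwise Pearson correlations
round(cor_matrix, 2)

# Strongest off-diagonal pair by absolute correlation.
off_diag <- cor_matrix
diag(off_diag) <- 0
which(abs(off_diag) == max(abs(off_diag)), arr.ind = TRUE)
```

The matrix is symmetric, so the strongest pair is reported twice, once for each ordering of the two variables.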
## [1] -0.6796961
The correlation between citric acid and pH (-0.53) is weaker than that of fixed acidity and pH. This makes sense because citric acid is a subset of fixed acidity.
## [1] -0.5277259
Acetic acid is a volatile acid, and volatile acidity has a positive correlation with pH of 0.24. Volatile acids are gaseous and evaporate while the wine bottle remains open. This is what wine connoisseurs call airing, which allows the wine to breathe. While the wine is airing, the pH level will increase, because the acidity is decreasing. However, the time that the wine was exposed to air is not recorded in this dataset. It would be interesting to see how airing time varies with pH.
## [1] 0.2385135
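A single coefficient says little about how precisely it is estimated. As a hedged sketch, base R's `cor.test()` returns the same Pearson estimate as `cor()` together with a confidence interval and p-value. The data below are simulated, so the printed values will not match the coefficients reported above.

```r
set.seed(7)
# Toy data (assumption): pH falls as fixed acidity rises, plus noise.
fixed.acidity <- runif(200, min = 4.6, max = 13.2)
pH <- 3.9 - 0.07 * fixed.acidity + rnorm(200, sd = 0.1)

r    <- cor(fixed.acidity, pH)        # Pearson by default
test <- cor.test(fixed.acidity, pH)   # adds a confidence interval and p-value

r
test$conf.int   # 95% interval for the correlation
test$p.value
```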
I am most interested in seeing what variables affect wine ratings. By binning the ratings into bad, average, and excellent, we can classify wines by type and explore any correlations. After binning the wine ratings, we will explore how alcohol, pH, volatile acidity, citric acid, and sulphates relate to the binned wine rating.
From the pH vs. wine rating boxplot shown below, we can see that on average, excellent rated wines have a lower pH value (mean 3.295) than bad rated wines (mean 3.385). However, the difference between excellent and bad is small, and it is difficult to say whether it is significant. Another thing to note is that the average and excellent ratings have similar distributions. The excellent distribution is within the average rating distribution. To improve the comparison, we could create smaller wine rating bins and/or survey more wines.
## red_wine$rating: bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.303 3.380 3.385 3.500 3.900
## --------------------------------------------------------
## red_wine$rating: average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.870 3.210 3.315 3.316 3.408 4.010
## --------------------------------------------------------
## red_wine$rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.880 3.200 3.280 3.295 3.380 3.780
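The layout of the grouped summaries above matches what base R's `by()` prints. The sketch below reproduces that layout on simulated data and adds a Welch t-test as one possible way to check whether the gap between bad and excellent wines is larger than chance. The group sizes and means loosely follow the output above, but the standard deviation and the data themselves are assumptions.

```r
set.seed(3)
# Toy stand-in (assumption): simulated pH values per rating group.
red_wine <- data.frame(
  rating = factor(rep(c("bad", "average", "excellent"),
                      times = c(62, 1262, 208)),
                  levels = c("bad", "average", "excellent")),
  pH = c(rnorm(62, 3.385, 0.15),
         rnorm(1262, 3.316, 0.15),
         rnorm(208, 3.295, 0.15))
)

# Per-group summary, printed in the same layout as the output above.
by(red_wine$pH, red_wine$rating, summary)

# Welch two-sample t-test on the two extreme groups only.
extremes <- droplevels(subset(red_wine, rating != "average"))
result <- t.test(pH ~ rating, data = extremes)
result$p.value
```

With only 62 bad wines, the interval around the difference is fairly wide, which fits the caution expressed above about the pH comparison.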
From the alcohol percentage vs. rating boxplot, we can observe a much greater difference between excellent and bad rated wines. Excellent rated wines have higher alcohol percentages. Bad and average rated wines are similar in alcohol percentage. The mean alcohol content of excellent wines is 11.54% (median 11.60%), compared to 10.20% for bad wines. It is also important to note that the excellent rated wine distribution sits visibly higher than the bad rated wine distribution.
## red_wine$rating: bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.60 10.00 10.20 10.97 13.10
## --------------------------------------------------------
## red_wine$rating: average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.50 9.50 10.00 10.26 10.90 14.00
## --------------------------------------------------------
## red_wine$rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.50 10.80 11.60 11.54 12.22 14.00
From the volatile acidity vs. rating boxplot, we can observe a much greater difference between excellent and bad rated wines. Excellent rated wines have lower volatile acidity. Bad and average rated wines were not similar in volatile acidity either (means of 0.73 and 0.54). The trend shows that the lower the volatile acidity, the better the rating. This suggests that wines should be thoroughly aired to allow the acetic acid to evaporate, which in turn would increase the pH value and possibly the rating. It would be interesting to see whether there are diminishing returns on quality if the wine is left to air out until no volatile acid is left.
## red_wine$rating: bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2300 0.5800 0.6800 0.7306 0.8838 1.5800
## --------------------------------------------------------
## red_wine$rating: average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1600 0.4100 0.5400 0.5385 0.6400 1.3300
## --------------------------------------------------------
## red_wine$rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3100 0.3700 0.4090 0.4925 0.9150
From the citric acid vs. rating boxplot, we can observe a much greater difference between excellent and bad rated wines. Excellent rated wines have higher citric acid levels. Bad and average rated wines were not similar in mean citric acid (0.17 and 0.25). The trend shows that the higher the citric acid, the better the rating. It would be interesting to see whether there are diminishing returns on quality beyond a certain citric acid level.
## red_wine$rating: bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0200 0.0750 0.1713 0.2675 1.0000
## --------------------------------------------------------
## red_wine$rating: average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0900 0.2400 0.2539 0.4000 0.7600
## --------------------------------------------------------
## red_wine$rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.3000 0.3950 0.3687 0.4900 0.7600
From the sulphates vs. rating boxplot, we can observe a much greater difference between excellent and bad rated wines. Excellent rated wines have higher sulphate levels. Bad and average rated wines were not similar in mean sulphates (0.59 and 0.65). The trend shows that the higher the sulphates, the better the rating. It would be interesting to see whether there are diminishing returns on quality beyond a certain sulphate level.
## red_wine$rating: bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.4925 0.5600 0.5927 0.6000 2.0000
## --------------------------------------------------------
## red_wine$rating: average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3700 0.5400 0.6100 0.6461 0.7000 1.9800
## --------------------------------------------------------
## red_wine$rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3900 0.6500 0.7400 0.7444 0.8200 1.3600
I was interested in what variables affect wine quality. It seemed that alcohol, volatile acidity, citric acid, and sulphates had the greatest effect on wine rating. In general, here were my findings:
Alcohol: The mean alcohol content of excellent wines is 11.54%, compared to 10.20% for bad wines. It is also important to note that the excellent rated wine distribution sits visibly higher than the bad rated wine distribution.
Volatile Acidity: The trend shows that the lower the volatile acidity, the better the rating. This suggests that wines should be thoroughly aired to allow the acetic acid to evaporate, which in turn would increase the pH value and possibly the rating. The mean volatile acidity for excellent rated wines was 0.409.
Citric Acid: The trend shows that the higher the citric acid, the better the rating. The mean citric acid for excellent rated wines was 0.369.
Sulphates: I can observe a much greater difference between excellent and bad rated wines. Excellent rated wines have higher sulphate levels. Bad and average rated wines were not similar in mean sulphates. The trend shows that the higher the sulphates, the better the rating. The mean sulphate content for excellent rated wines was 0.744.
Fixed acidity and pH were negatively correlated, because a lower pH means more acidic.
The strongest relationship was with alcohol. Alcohol to wine quality had a correlation value of 0.48783279.
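Since quality is stored as an ordered factor after cleaning, it has to be converted to numbers before a correlation with alcohol can be computed. The sketch below shows one way to do that on simulated data. The variables are assumptions for illustration, and the printed values will not match the coefficient quoted above.

```r
set.seed(11)
# Toy data (assumption): higher alcohol nudges the quality score upward.
alcohol <- runif(300, min = 8.4, max = 14)
score   <- pmin(8, pmax(3, round(5.6 + 0.35 * (alcohol - 10.4) +
                                   rnorm(300, sd = 0.7))))
quality <- factor(score, levels = 3:8, ordered = TRUE)

# as.numeric() on a factor returns level codes (1..6), not the labels (3..8).
codes  <- as.numeric(quality)
labels <- as.numeric(as.character(quality))

# Pearson correlation is unchanged by the constant shift between the two.
cor(alcohol, codes)
cor(alcohol, labels)

# Rank-based alternative that only assumes the quality scale is ordered.
cor(alcohol, codes, method = "spearman")
```

The Spearman version is worth a look here because quality is an ordinal score with only six distinct values.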
Upon completion of the bivariate analysis, it seems that the variables that affect wine rating are sulphates, citric acid, volatile acidity, alcohol, and pH. In this multivariate section, we will compare the interaction of these factors on wine rating.
Right away, we can notice that many more average rated wines overlap when compared to bad and excellent rated wines. This is similar to what we saw in the boxplots. However, when comparing bad and excellent rated wines, we notice that nearly all bad rated wines were below ~12% alcohol and at around -0.25 log10(sulphates) or less, whereas many excellent rated wines had more than 12% alcohol and fell between -0.25 and 0 log10(sulphates). This comparison suggests that a higher alcohol content (>12%) and a log10(sulphates) value between -0.25 and 0 are associated with average and excellent wine ratings. However, these 2 variables don’t adequately differentiate between average and excellent ratings.
Again, we can notice that many more average rated wines overlap when compared to bad and excellent rated wines. However, when comparing bad and excellent rated wines, we notice that nearly all bad rated wines were below ~12% alcohol and between 0 and 0.5 citric acid, whereas many excellent rated wines had more than 12% alcohol and fell between 0 and 0.75 citric acid. However, these 2 variables don’t adequately differentiate between average and excellent ratings. I would also note that we don’t have as many bad rated wine data points. Perhaps, if we had more, the data would look similar across all wine ratings.
It could be that alcohol isn’t a big enough factor to visually separate the ratings. So we will try sulphates and citric acid against rating in the facet plot below.
We can notice that many more average rated wines overlap when compared to bad and excellent rated wines. However, when comparing bad and excellent rated wines, we notice that most bad rated wines were at about -0.25 log10(sulphates) or less and below 0.5 citric acid, whereas many excellent rated wines fall between -0.25 and 0 log10(sulphates) and below 0.75 citric acid. However, these 2 variables don’t adequately differentiate between average and excellent ratings.
## red_wine$rating: bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.4815 -0.3076 -0.2518 -0.2455 -0.2218 0.3010
## --------------------------------------------------------
## red_wine$rating: average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.4318 -0.2676 -0.2147 -0.2011 -0.1549 0.2967
## --------------------------------------------------------
## red_wine$rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.40894 -0.18709 -0.13077 -0.13514 -0.08619 0.13354
Next, we try volatile acidity and citric acid. We notice here that excellent rated wines tend to have a citric acid content higher than ~0.3 and a volatile acidity of less than ~0.5.
## red_wine$rating: bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2300 0.5800 0.6800 0.7306 0.8838 1.5800
## --------------------------------------------------------
## red_wine$rating: average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1600 0.4100 0.5400 0.5385 0.6400 1.3300
## --------------------------------------------------------
## red_wine$rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3100 0.3700 0.4090 0.4925 0.9150
Here we see again the effect of volatile acidity. Lower volatile acidity and higher sulphates trend toward an excellent rating.
## red_wine$rating: bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2300 0.5800 0.6800 0.7306 0.8838 1.5800
## --------------------------------------------------------
## red_wine$rating: average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1600 0.4100 0.5400 0.5385 0.6400 1.3300
## --------------------------------------------------------
## red_wine$rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3100 0.3700 0.4090 0.4925 0.9150
In the multivariate section, we can now see how variables act together on wine ratings. This would be useful for building models. Below is a summary of the rectangles I put around excellent and bad wine ratings (sulphates are on the log10 scale):
Alcohol vs. Sulphates. Excellent: sulphates -0.25 to 0 and alcohol 10 to 13. Bad: sulphates -0.375 to 0.125 and alcohol 9 to 12.
Alcohol vs. Citric Acid. Excellent: citric acid 0 to 0.75 and alcohol 9 to 14. Bad: citric acid 0 to 0.5 and alcohol 9 to 12.
Citric Acid vs. Sulphates. Excellent: sulphates -0.25 to 0 and citric acid 0 to 0.75. Bad: sulphates -0.375 to -0.125 and citric acid 0 to 0.5.
Sulphates vs. Volatile Acidity. Excellent: sulphates -0.25 to 0 and volatile acidity 0.6 to 0.8. Bad: sulphates -0.375 to -0.125 and volatile acidity 0.4 to 1.2.
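The rectangles summarized above were placed by eye. One possible way to make such bands reproducible is to derive them from quantiles within each rating group. The sketch below does this on simulated data: the data frame, its values, the helper `band`, and the choice of the central 90% are all assumptions, not the method used for the plots.

```r
set.seed(5)
# Toy stand-in (assumption) with two of the variables discussed above.
red_wine <- data.frame(
  rating  = factor(rep(c("bad", "excellent"), each = 100)),
  alcohol = c(rnorm(100, 10.2, 0.9), rnorm(100, 11.5, 1.0)),
  log10.sulphates = c(rnorm(100, -0.25, 0.08), rnorm(100, -0.14, 0.08))
)

# Hypothetical helper: central 90% range of a variable within each rating.
# Returns one row per rating with the 5% and 95% quantiles as columns.
band <- function(x, g) {
  t(sapply(split(x, g), quantile, probs = c(0.05, 0.95)))
}

alcohol_band  <- band(red_wine$alcohol, red_wine$rating)
sulphate_band <- band(red_wine$log10.sulphates, red_wine$rating)

alcohol_band
sulphate_band
```

Each row gives the lower and upper edge of a rectangle along one axis, so two calls define one rectangle per rating.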
When I compare the max and min values of alcohol, sulphates, and citric acid, it seems that sulphates narrow the acceptable band of alcohol content for an excellent rating. For example:
To get an excellent rating given the band of sulphates, alcohol had to be between 10 and 13. However, with citric acid, the excellent alcohol band was larger, 9 to 14.
Alcohol vs. Sulphates. Excellent: sulphates -0.25 to 0 and alcohol 10 to 13. Bad: sulphates -0.375 to 0.125 and alcohol 9 to 12.
Alcohol vs. Citric Acid. Excellent: citric acid 0 to 0.75 and alcohol 9 to 14. Bad: citric acid 0 to 0.5 and alcohol 9 to 12.
## red_wine$rating: bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.4925 0.5600 0.5927 0.6000 2.0000
## --------------------------------------------------------
## red_wine$rating: average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3700 0.5400 0.6100 0.6461 0.7000 1.9800
## --------------------------------------------------------
## red_wine$rating: excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3900 0.6500 0.7400 0.7444 0.8200 1.3600
Excellent wines tend to have higher sulphate content. The trend shows that the higher the sulphates, the better the rating.
The maximum and minimum are shown below for excellent and bad rated wines. Alcohol vs. Sulphates. Excellent: sulphates -0.25 to 0 and alcohol 10 to 13. Bad: sulphates -0.375 to 0.125 and alcohol 9 to 12.
In general, the higher the sulphates and alcohol content, the better the rating.
The density graph shows where excellent quality wines are concentrated with respect to volatile acidity. In this dataset, no excellent rated wine has more than 1 g/dm^3 volatile acidity (the maximum is 0.915).
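A threshold claim like this can be checked directly per rating group. The sketch below uses simulated volatile acidity values (an assumption, not the wine data) to show the two base R calls involved.

```r
set.seed(9)
# Toy stand-in (assumption): right-skewed volatile acidity per rating.
red_wine <- data.frame(
  rating = factor(rep(c("bad", "average", "excellent"),
                      times = c(60, 300, 100)),
                  levels = c("bad", "average", "excellent")),
  volatile.acidity = c(rlnorm(60,  log(0.70), 0.3),
                       rlnorm(300, log(0.52), 0.3),
                       rlnorm(100, log(0.38), 0.3))
)

# Largest value observed in each rating group.
group_max <- tapply(red_wine$volatile.acidity, red_wine$rating, max)

# Share of each group above the 1 g/dm^3 threshold
# (the mean of a logical vector is the proportion of TRUE values).
share_above <- tapply(red_wine$volatile.acidity > 1, red_wine$rating, mean)

group_max
share_above
```

A share of zero in a sample does not rule out such wines in general, so the result is best read as a description of the data at hand.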
Personally, the hardest part was the time it took to read up on wine background, create all the plots, and then discuss them. The entire project was quite large and took a lot of time. But I feel more confident in my knowledge about wine.
I liked this project because it taught me about wine and now I understand why it is important to air out the wine in order to remove the volatile acidity.
I would collect more data. There weren’t enough bad data points. It may have been because the people tasting the wines weren’t that experienced, and so they just put average for wines that would otherwise have been rated bad. It would also be nice to know how long wines were aired out for, where the grapes were grown, and where they were processed. After taking the machine learning course, I will build a predictive model.